Let’s first load the diamonds dataset.
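library(ggplot2)  # provides ggplot() and the diamonds dataset
library(plotly)   # provides ggplotly()
data("diamonds")
head(diamonds)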
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
Bar Plot: Great for categorical data.
Scatter Plot: Perfect for two numerical variables.
Histogram: Ideal for showing the distribution of one numeric variable.
Practice 1: Bar Plot (Categorical Data)
Let’s start simple. A bar plot is great for comparing categories. Create a ggplot bar chart comparing the counts of diamonds by their cut quality.
# Create a ggplot bar chart
bar_plot <- ggplot(diamonds, aes(x = cut)) +
  geom_bar() +
  ggtitle("Diamonds by Cut Quality") +
  theme_minimal()
bar_plot
Question: What do you notice about the distribution of cuts? ‘Ideal’ is the most common cut in this dataset!
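If you want to check the bar heights numerically rather than by eye, one option is to tabulate the cut column directly. This is a minimal sketch, assuming ggplot2 is installed; the names cut_counts and most_common_cut are illustrative and not part of the original tutorial.

```r
library(ggplot2)  # the diamonds dataset ships with ggplot2

# Count how many diamonds fall into each cut category
cut_counts <- table(diamonds$cut)
cut_counts

# Name of the category with the highest count
most_common_cut <- names(cut_counts)[which.max(cut_counts)]
most_common_cut
```

The counts printed here are the same values geom_bar() computes for the bar heights, so the table and the plot should agree.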
Now, let’s add some bling by converting this into a plotly plot.
# Convert to interactive plot
ggplotly(bar_plot)
Look! You can now hover over the bars and see exactly how many diamonds there are in each category. Much shinier!
Practice 2: Scatter Plot (Two Numeric Variables)
Scatter plots are perfect for showing relationships between two numerical variables. Let’s check out the relationship between carat (weight) and price. Is bigger always better?
# Create a ggplot scatter plot
scatter_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.5, color = "blue") +
  ggtitle("Carat vs Price") +
  theme_minimal()
scatter_plot
Now, convert this to a Plotly plot so you can zoom in and get up close with those high-priced diamonds!
# Convert to interactive plot
ggplotly(scatter_plot)
Question: What can you infer from this scatter plot? Do bigger diamonds (higher carat) seem to cost more? Notice, too, the steep jump in prices for certain diamonds.
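To back up the visual impression with numbers, you could compute a correlation and compare typical prices across weight bands. The sketch below is one possible approach, not the tutorial's own method; the band boundaries and the names carat_price_cor, carat_band, and median_by_band are arbitrary choices made for illustration.

```r
library(ggplot2)  # the diamonds dataset ships with ggplot2

# Pearson correlation between weight and price (ranges from -1 to 1)
carat_price_cor <- cor(diamonds$carat, diamonds$price)
carat_price_cor

# Group diamonds into broad weight bands (boundaries chosen arbitrarily)
carat_band <- cut(diamonds$carat, breaks = c(0, 0.5, 1, 1.5, 2, Inf))

# Median price within each band
median_by_band <- tapply(diamonds$price, carat_band, median)
median_by_band
```

A correlation close to 1 supports the idea that heavier diamonds tend to cost more, while the band medians give a rough sense of how quickly the typical price climbs.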
Practice 3: Histogram (Distribution of Numeric Data)
Histograms help you see the distribution of a single numeric variable. Let’s check out the distribution of diamond prices. Ready to be shocked?
# Create a ggplot histogram
histogram <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 1000, fill = "green", color = "black") +
  ggtitle("Distribution of Diamond Prices") +
  theme_minimal()
histogram
Make it interactive to explore those outlier prices!
# Convert to interactive plot
ggplotly(histogram)
Challenge: Try adjusting the binwidth in the histogram and see how it changes the shape of the distribution. What happens when you make the binwidth smaller or larger?
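One way to approach the challenge is to wrap the histogram in a small function and call it with different bin widths. This is only a sketch: price_histogram is a hypothetical helper, not part of the original tutorial, and the widths 250 and 2500 are arbitrary picks on either side of the original 1000.

```r
library(ggplot2)

# Hypothetical helper: builds the price histogram for a given bin width
price_histogram <- function(bw) {
  ggplot(diamonds, aes(x = price)) +
    geom_histogram(binwidth = bw, fill = "green", color = "black") +
    ggtitle(paste("Distribution of Diamond Prices, binwidth =", bw)) +
    theme_minimal()
}

narrow_bins <- price_histogram(250)   # many thin bars, more detail and more noise
wide_bins   <- price_histogram(2500)  # few wide bars, smoother but coarser

narrow_bins
wide_bins
```

Either plot can be passed to ggplotly() in the same way as the original histogram.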
Bonus Practice: Add More Bling (Customization)
The cool thing about Plotly is that you can keep customizing. Let’s spice up our scatter plot by adding color based on the diamond’s clarity.
# Create a customized ggplot scatter plot
scatter_plot_colored <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.5) +
  ggtitle("Carat vs Price (Colored by Clarity)") +
  theme_minimal()
# Convert to interactive plot
ggplotly(scatter_plot_colored)
Now you can see the relationship between price, carat, and clarity interactively! Hover over the points to discover the clarity of those shiny diamonds.
7. Faceting: Viewing Data Across Categories
Faceting is an effective way to create separate panels within a single plot for different categories. For instance, if you want to examine how diamond prices vary across different levels of clarity, faceting can help.
Let’s create a faceted scatter plot showing the relationship between carat and price, separated by diamond cut.
# Create a ggplot scatter plot with facets
facet_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, color = "purple") +
  facet_wrap(~ cut) +
  ggtitle("Carat vs Price Faceted by Cut") +
  theme_minimal()
facet_plot
Now, let’s make this faceted plot interactive:
# Convert to interactive plot
ggplotly(facet_plot)
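The clarity example mentioned at the start of this section could be sketched the same way by swapping the faceting variable. The name clarity_facets and the two-row layout are assumptions made for illustration, not part of the original tutorial.

```r
library(ggplot2)

# Same carat-vs-price scatter plot, but with one panel per clarity grade
clarity_facets <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, color = "purple") +
  facet_wrap(~ clarity, nrow = 2) +  # arrange the panels in two rows
  ggtitle("Carat vs Price Faceted by Clarity") +
  theme_minimal()
clarity_facets
```

Because clarity has more levels than cut, this version produces more panels, so setting nrow (or ncol) helps keep each one readable.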
# Create a scatter plot with a trend line
trend_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3, color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  ggtitle("Carat vs Price with Trend Line") +
  theme_minimal()
trend_plot
## `geom_smooth()` using formula = 'y ~ x'
Convert this to an interactive plot:
# Convert to interactive plot
ggplotly(trend_plot)
## `geom_smooth()` using formula = 'y ~ x'
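The message above says the trend line was fitted with the formula y ~ x. With method = "lm", that is an ordinary linear regression of price on carat, so you can inspect the same line numerically by fitting it yourself. This is a minimal sketch; the name price_fit is illustrative.

```r
library(ggplot2)  # the diamonds dataset ships with ggplot2

# Ordinary least-squares fit of price on carat, the same model that
# geom_smooth(method = "lm") draws with the formula y ~ x
price_fit <- lm(price ~ carat, data = diamonds)

# Intercept and slope of the trend line
coef(price_fit)

# Share of the variance in price explained by carat alone
summary(price_fit)$r.squared
```

The slope can be read as the average change in price per additional carat under a straight-line assumption. The scatter plot suggests the true relationship is curved, so treat the line as a rough summary.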
In this tutorial, you learned how to: